Use tilized dram-interleaved as default input-output layout #1744
Conversation
Force-pushed from 1a31acb to d4c5383
Reviewed the compiler portion, will take a look at the runtime part later today!
currentL1Usage -= currentL1UsagePerOp[op].l1MemUsagePerUser;
currentL1UsagePerOp.erase(op);
}
@fbajraktariTT, can you review this file?
FYI @odjuricicTT, since @fbajraktariTT recently completed his internship.
@jnie-TT I'm not sure that this extra logic is needed. Was a test failing without this temp fix? If so, can you provide more details?
@odjuricicTT there's an assert below that checks whether currentL1Usage is 0. This error only surfaces with my changes - it's fine without them because we always untilize (to_layout) before returning. With my changes, however, we may now return directly, with no intermediate ops between the current op and the return op, and that causes issues because currentL1Usage would never be zeroed out.
Since this function doesn't decrement L1 usage on the return op, the assert fires and reports that the L1 usage is non-zero. My change basically adds a check: if the consumer op is a return op, we decrement the L1 usage.
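For context, a minimal sketch of the kind of check being described; the map value type and surrounding analysis are assumptions modeled on the snippet quoted above, not the actual tt-mlir code:
#include <cstdint>
#include "mlir/Dialect/Func/IR/FuncOps.h"
#include "mlir/IR/Operation.h"
#include "llvm/ADT/DenseMap.h"
// Hypothetical value type mirroring currentL1UsagePerOp[op].l1MemUsagePerUser
// from the snippet above.
struct L1UsageEntry {
  std::uint64_t l1MemUsagePerUser = 0;
};
// If `op` is consumed directly by a func.return, release its L1 allocation.
// The return op is never scheduled as a consumer, so without this the final
// assert(currentL1Usage == 0) would fire.
static void releaseL1IfConsumedByReturn(
    mlir::Operation *op, std::uint64_t &currentL1Usage,
    llvm::DenseMap<mlir::Operation *, L1UsageEntry> &currentL1UsagePerOp) {
  for (mlir::Operation *user : op->getUsers()) {
    if (mlir::isa<mlir::func::ReturnOp>(user)) {
      currentL1Usage -= currentL1UsagePerOp[op].l1MemUsagePerUser;
      currentL1UsagePerOp.erase(op);
      break;
    }
  }
}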
@jnie-TT Thanks! Your solution is fine for now. Just please file the followup issue and reference it in the comment.
// TTNN Reshape does not support implicit tilization/untilization
// Therefore input output layouts should be the same
if (mlir::isa<ttir::ReshapeOp>(operation) && operandNumber == 1) { |
I feel like we should have attributes on the op that denote these kinds of capabilities instead of having this code be special-cased for a specific op. @sdjordjevicTT thoughts?
Perhaps we should add an interface to all TTNN ops called shouldTilize that defaults to true and that ops can specialize.
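A hedged illustration of the idea (not an existing tt-mlir interface); in practice this would likely be an OpInterface with a default implementation that individual ops override, but a free-function sketch shows the intent:
// Default: TTNN ops are assumed to accept tiled operands; ops that require
// row-major operands (e.g. the Reshape case above) would opt out.
inline bool shouldTilize(mlir::Operation *operation, unsigned operandNumber) {
  if (mlir::isa<ttir::ReshapeOp>(operation) && operandNumber == 1) {
    return false;
  }
  return true;
}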
Yeah, that would be awesome to have. I know a lot of eltwise ops face a similar issue regarding data type: some ops can typecast implicitly whereas others cannot, which results in the IR being misaligned with the actual runtime output.
I am thinking about these scenarios. Do we have some examples?
Aren't these examples? i.e. reshape, conv2d, slice & embedding, or do you mean something else?
@sdjordjevicTT if you mean the implicit typecast ops, an example would be relational binary ops vs unary ops.
Relational operations take an output_dtype that we set to typecast implicitly within the op:
template <BinaryOpType binary_op_type>
struct RelationalBinary {
static Tensor invoke(
uint8_t queue_id,
const Tensor &input_tensor_a_arg,
const Tensor &input_tensor_b_arg,
const std::optional<const DataType> &output_dtype = std::nullopt,
const std::optional<MemoryConfig> &memory_config = std::nullopt,
std::optional<Tensor> optional_output_tensor = std::nullopt,
std::optional<unary::FusedActivations> activations = std::nullopt,
std::optional<unary::UnaryWithParam> input_tensor_a_activation = std::nullopt);
However unary ops do not:
template <UnaryOpType... unary_op_types>
Tensor ExecuteUnary<unary_op_types...>::invoke(
const Tensor& input_tensor,
const std::optional<MemoryConfig>& memory_config,
const std::optional<Tensor>& optional_output_tensor) {
And our compiler doesn't distinguish between them, i.e. for unary ops it'll still assume the output tensor of the unary op is properly typecasted to the desired data type.
As for ops that don't support implicit tilization/untilization, some examples include reshape, concat, transpose.
I believe there was a misunderstanding between us. :)
Regarding Conv, Slice, and Embedding, I'm aware that they require some inputs to be in a row-major layout. I'll address this by implementing the necessary layout workarounds. If the Metal developers decide not to support tile layout for them, then we can introduce a trait/interface to accommodate them.
Regarding the implicit conversions, I get it for the data type, but how are we specifying whether the output is in tile/row-major layout? By defining the optional_output_tensor? I see what the issue can be: if you have a row-major input, you want to keep the output row-major for such ops. We can think about adding the interface at the op level to support this.
I created issues for myself to follow up on this:
if (mlir::isa<ttir::Conv2dOp>(operation) ||
    mlir::isa<ttir::SliceOp>(operation) ||
    (mlir::isa<ttir::EmbeddingBackwardOp>(operation) &&
     operandNumber < 2)) {
Same as above.
This will be cleaned up with the workarounds. We have tasks for each of these to clean up.
Force-pushed from d4c5383 to 676c714
Force-pushed from 676c714 to a9a8eff
A few comments on Optimizer related changes, but looks good overall.
Requesting changes until optimizer layout overrides are fixed. I'll help with this.
// API can determine the correct devices. Enabling this workaround will assume
// that a device tensor will reside in the L1/Dram of the first device (device
// id 0) of the device grid. This should be removed once we add the device
// grid information to the tensorDesc. |
So there is a strategy field on LayoutDesc that will be set to ::tt::target::DistributedTensorConfig::NONE for a single-chip setup, or to some kind of multi-device distribution otherwise. LMK if this doesn't resolve the issue.
table LayoutDesc {
  stride: [int];
  oob_val: OOBVal;
  core_range_set: [Dim2dRange];
  memory_desc: MemoryDesc;
  strategy: DistributionStrategy;
}
Reach out to @wooseokTT if you need help interpreting how it's programmed.
@nsmithtt the strategy doesn't tell us which submesh a tensor belongs to though, right? I remember that when I added it, it was used to specify the tensor distribution method across multiple devices (replicate, shard, etc.).
I can use it to distinguish between single/multichip, but I don't know the mesh shape or mesh offset that the tensor is mapped to if it's multidevice, and I need this info if I want to move a tensor to a multidevice mesh in the toLayout API.
It depends on the strategy, but e.g. ShardTensor2D.shard_mesh does tell you the mesh shape. I think the offset is always implicitly inferred to be [0, 0]; @wooseokTT feel free to correct me if I'm wrong, but this reflects the TTNN API, which doesn't support arbitrary mesh offsets.
Yeah, you're right, ShardTensor2D has the shard_mesh, but it seems like the other strategies don't... If all we're using is ShardTensor2D with offset [0, 0], then I guess I can just derive the mesh shape from that, and maybe add an assert that the strategy must be ShardTensor2D. Does doing it this way make sense with how we're performing multichip operations?
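A rough sketch of that derivation, assuming standard flatbuffers-generated accessors for the LayoutDesc schema shown above (strategy(), strategy_type(), strategy_as_ShardTensor2D(), and a shard_mesh with y/x fields); these names are assumptions and would need to be checked against the generated headers:
#include <cassert>
#include <cstdint>
#include <utility>
// Hedged sketch: derive the mesh shape from the tensor's distribution strategy,
// treating NONE as single chip and asserting ShardTensor2D otherwise.
static std::pair<uint32_t, uint32_t>
getMeshShape(const ::tt::target::LayoutDesc *layout) {
  const auto *strategy = layout->strategy();
  if (!strategy ||
      strategy->strategy_type() == ::tt::target::DistributedTensorConfig::NONE) {
    return {1, 1}; // single chip; offset implicitly [0, 0]
  }
  assert(strategy->strategy_type() ==
             ::tt::target::DistributedTensorConfig::ShardTensor2D &&
         "only ShardTensor2D distribution is currently expected");
  const auto *shard2d = strategy->strategy_as_ShardTensor2D();
  return {static_cast<uint32_t>(shard2d->shard_mesh()->y()),
          static_cast<uint32_t>(shard2d->shard_mesh()->x())};
}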
@jnie-TT I'm preparing for this change in tt-forge-fe, so I've tested this branch. The only issue I've observed is not related directly to your change, but it has been exposed by it.
For tilized tensors we can have cases where FEs will get wrong stride information (when trying to allocate buffers for them).
The problem occurs when serializing the layout into the flatbuffer, and it's due to the fact that we have tied stride calculation to the layout attribute. As currently implemented, the same layout attribute can produce different strides (depending on the logical shape of the tensor), so we end up with a problem when serializing into the flatbuffer because of the way the caching mechanism works there (all tensors with the same layout attribute will be serialized exactly the same).
E.g., all tensors with layout #ttnn.ttnn_layout<(d0, d1) -> (d0, d1), <1x1>, memref<1x1x!tt.tile<32x32, bf16>, #dram>, <interleaved>> will end up having the same strides, even though they can have different logical shapes.
Note: I am not sure if getting the stride for tilized tensors even makes sense, so we might want to introduce a different mechanism.
Yikes!! That's a good catch. It seems our options are:
Open to additional ideas/alternatives.
@pilkicTT, what is the fallout of the stride caching bug?
Optimizer changes look good.
I don't have the numbers, but for now we can work around it by assuming a "uniform" stride and calculating it from the logical shapes. Additionally, regardless of this issue, we need to untilize the output tensors before memcpying them back. We should probably check other frontends and see if they are affected by these problems as well.
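For reference, a minimal sketch of that "uniform" stride workaround: compute row-major strides purely from the logical shape, ignoring the tiled physical layout.
#include <cstdint>
#include <vector>
// Row-major strides derived from the logical shape only.
// e.g. shape {2, 3, 4} -> strides {12, 4, 1}
std::vector<std::uint32_t> uniformStrides(const std::vector<std::uint32_t> &shape) {
  std::vector<std::uint32_t> strides(shape.size(), 1);
  for (int i = static_cast<int>(shape.size()) - 2; i >= 0; --i) {
    strides[i] = strides[i + 1] * shape[i + 1];
  }
  return strides;
}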
Maybe we should just get a proper fix in as a separate PR first. This seems like an innocuous-looking but nasty bug that would suck to rediscover during the interim period.
Let's do that, that would be great!
Force-pushed from 03dfc27 to 7434d7d
Thanks for pushing this change! A couple of comments inline, but nothing blocking.
auto device = op.getDevice();
assert((device || output.isOnHost()) &&
       "Op device must be set for output tensors on device");
if (not device and not output.isOnHost()) { |
Nit: Per the coding style guidelines, we decided not to use alternative tokens (not, and, or, etc.):
https://docs.tenstorrent.com/tt-mlir/coding-guidelines.html#using-alternative-tokens-and-or-xor-etc
Can you please switch to the regular logical operators? I have an ongoing PR that fixes this file:
#2055
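For reference, the snippet above with the alternative tokens replaced by the regular operators:
auto device = op.getDevice();
assert((device || output.isOnHost()) &&
       "Op device must be set for output tensors on device");
// Same condition as before, using ! and && per the coding guidelines.
if (!device && !output.isOnHost()) {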
Thanks!
}
continue;
}

// TTNN mesh shard expects host input and output
// TODO(jnie): This can be removed once the workaround pass can correctly
Can you please open an issue for me regarding this and how I can repro it? I would like to take a look; currently there is a canonicalization pass after the workaround pass, so I can look into it.
Issue created here: #2102
Force-pushed from 7434d7d to 1d02669
Force-pushed from 1d02669 to c5e1cca
Thanks, Jackson!
Force-pushed from c5e1cca to d2234eb
Force-pushed from d2234eb to fcc00ee
### Ticket
Fixes #233
### Problem description
With the change tenstorrent/tt-mlir#1744 on tt-mlir main, the output tensor layout is now dram-interleaved, tiled by default.
### What's changed
Prior to memcpying the runtime tensor to host memory, we need to call the `tt::runtime::toHost` API to untilize the tensor.
### Checklist
- [x] New/Existing tests provide coverage for changes
Previously, we performed the conversion from MLIR types to tt types in several places in the code. #1744 added type conversion in TTNNLayout, which converts all types to tt-supported types and makes this workaround redundant.
Description
Part of the runtime stitching effort #1743.
This PR updates the default input/output layout from row-major in system memory to tiled, dram-interleaved.
Combined with the runtime stitching APIs, this enables the user to pre-tilize and interleave tensors (such as weights) and reuse them across multiple programs, eliminating ping-ponging between host/dram and row-major/tile layouts.
IR Example
TTNN IR of simple_matmul test on main:
TTNN IR of simple_matmul test after this change:
Changes
TTNNLayout
TTIRToTTNN
Optimizer
Added a workaround that moves GetDeviceOps to the front of the op schedule.
Added a workaround that checks for ReturnOps in the L1 usage calculation.
Marked layout-forcing tests as XFail.
Runtime
MLIR Tests
TODOs Before Merging
Frontends need to add a runtime::toHost call before memcpying tensors. runtime::toHost accepts an untilize flag that will untilize the tensor (see the sketch below).
Update TODO comments once proper issues are created (optimizer, runtime workaround).
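A minimal sketch of that frontend-side pattern; the exact toHost signature and return type are assumptions based on this description rather than the verified runtime API:
// Untilize the device output on the way back to host before memcpying it
// into frontend-owned memory.
auto hostTensor = ::tt::runtime::toHost(deviceOutput, /*untilize=*/true);
// ...then memcpy hostTensor's underlying data into the frontend buffer as before.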